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ESTIMATION OF CONTEXTUAL EFFECTS THROUGH NONLINEAR MULTILEVEL LATENT 
VARIABLE MODELING WITH A METROPOLIS-HASTINGS ROBBINS-MONRO ALGORITHM 


Abstract 


The main purpose of this study is to improve estimation efficiency in obtaining maxi- 
mum marginal likelihood estimates of contextual effects in the framework of nonlinear 
multilevel latent variable model by adopting the Metropolis-Hastings Robbins-Monro 
algorithm (MH-RM; Cai, 2008, 2010a, 2010b). Results indicate that the MH-RM algo- 
rithm can produce estimates and standard errors efficiently. Simulations, with various 
sampling and measurement structure conditions, were conducted to obtain information 
about the performance of nonlinear multilevel latent variable modeling compared to 
traditional hierarchical linear modeling. Results suggest that nonlinear multilevel la- 
tent variable modeling can more properly estimate and detect contextual effects than 
the traditional approach. As an empirical illustration, data from the Programme for 
International Student Assessment (PISA; OECD, 2000) were analyzed. 


Keywords: contextual effect, multilevel modeling, latent variable modeling, multilevel 
latent variable modeling 
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1 Introduction 

In social science research, a contextual effect is traditionally defined as the differ- 
ence between two coefficients in a hierarchial linear model (HLM) analysis framework 
(Raudenbush & Bryk, 1986; Willms, 1986; Lee & Bryk, 1989; Raudenbush & Willms, 
1995): one from the individual-level and the other coefficient from the group-level. A 
representative application of this kind of contextual effect in education was discussed in 
Raudenbush and Bryk (2002) using a subset of High School and Beyond (HS&B) data. 
In this example, individual math achievement is regressed on individual-level socioeco- 
nomic status (SES) and school-level math achievement is regressed on aggregated school- 
level SES using multilevel modeling. The result shows that two coefficient estimates are 
not the same, indicating two students who have the same SES level are expected to have 
different levels of math achievement depending on to which school a student belongs. 
Statistically significant difference between these two coefficients represents a significant 
compositional effect. Though hierarchical linear modeling opened the door to estimating 
contextual effects, there have been two unresolved problems. The first one is related to 
the attenuated coefficient estimates due to measurement error in predictors (Spearman, 
1904), and the other is biased parameter estimates due to sampling error associated with 
aggregating level-1 variables to form level-2 variables by simply averaging the values 
Raudenbush & Bryk, 2002, Ch. 3). 

To handle measurement error and sampling error more properly, multilevel latent vari- 
able modeling has been suggested as an alternative to traditional methods (e.g. Liidtke et 
al., 2008; Ltidtke, Marsh, Robitzsch, & Trautwein, 2011; Marsh et al., 2009). Ltidtke et al. 
(2008) proposed a multilevel latent variable modeling framework for contextual analy- 
sis. Ltidtke et al. (2008)’s simulation study is noteworthy in that the study examined the 
relative bias in contextual effect estimates when the traditional HLM is used under dif- 
ferent data conditions. The results showed that the relative percentage bias of contextual 


effect was less than 10% across varying data conditions when a multilevel latent variable 
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model was used. On the other hand, the relative percentage bias of contextual effect was 
up to 80% when the traditional HLM was used. However, the traditional HLM can yield 
less than 10% relative bias under favorable data conditions - that is, when level-1 and 
level-2 units exceed 30 and 500, respectively, and when there is substantial intra-class 
correlation (ICC) in the predictor (e.g., 0.3). However, the type of manifest variables is 
limited to continuous only in Liidtke et al.’s (2008) study. 

Marsh et al. (2009) conducted another noted study using multilevel latent variable 
modeling for contextual effect analysis. Marsh and colleagues compared several contex- 
tual modeling options related to ’big fish-little-pond effect (BFLPE)” estimation using an 
empirical data set in which academic achievement (predictor) and academic self-concept 
(outcome) were measured by, respectively, three and four continuous manifest variables. 
Among the tested models, a multilevel latent variable model yielded the largest BFLPE 
estimate. The authors described this model as a doubly latent variable contextual model. 
Such a model is theoretically the most desirable choice for researchers, since the model 
tries to take both measurement and sampling error into account by utilizing information 
from all the manifest variables, rather than using summed or averaged scores at both 
individual- and group-level. The study also illustrated how the nonlinear multilevel 
latent variable modeling approach can provide flexibility in modeling by including ran- 
dom slopes, latent (within-level or cross-level) interactions, and latent quadratic effects. 
In both Ltidtke et al.’s (2008) and Marsh et al.’s (2009) studies, they utilized continuous 
manifest variables, while the current study considers categorical indicators (item-level 
data) for all latent variables in the model. 

While theoretically desirable, nonlinear multilevel latent variable modeling poses sig- 
nificant computational difficulties. Standard approaches such as numerical integration 
(e.g., adaptive quadrature) based EM or Markov chain Monte Carlo (MCMC, e.g., Gibbs 
Sampling) based estimation methods have important limitations that make them less 


practical for routine use. With respect to numerical integration, its computational bur- 
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den increases exponentially when the dimensionality of latent variable space is high, as 
is the case with the current nonlinear multilevel latent variable model. On the other 
hand, while MCMC is entirely free from the curse of multidimensionality, it is not im- 
mune from issues that include advanced tuning requirements, specification of priors, 
and convergence analysis for complex models. Ltidtke et al. (2011) also reported the oc- 
currence of unstable estimates. The model has difficulty converging when small sample 
size is combined with low intraclass correlation coefficient (ICC) in predictors, and also 
when there are substantial amount of missing observations in the manifest variables. 
The main objective of this study is to develop a more efficient and stable estima- 
tion method for contextual effects in the nonlinear multilevel latent variable modeling 
framework, by adopting the Metropolis-Hastings Robbins-Monro algorithm (MH-RM; 
Cai, 2008, 2010a, 2010b). This study significantly extends the applications of MH-RM 
algorithm to the case of multilevel modeling. Prior research using MH-RM is limited 
to single level applications, e.g., exploratory and confirmatory item factor analysis (Cai, 
2010a, 2010b), latent regression modeling (von Davier & Sinharay, 2010), and item re- 
sponse theory modeling with non-normal latent variables (Monroe & Cai, 2014). 
Computational efficiency and parameter recovery were assessed in a comparison 
with an implementation of EM algorithm using adaptive Gauss-Hermite quadrature 
(Mplus; Muthén & Muthén, 2008). Another objective was to find, through a simula- 
tion study, the extent to which measurement error and sampling error can influence 
contextual effect estimates under different conditions. The results can provide practical 
rationales for the application of computationally demanding nonlinear multilevel latent 
variable models. The last objective of this study was to provide an empirical illustration 
of estimating contextual effects by applying nonlinear multilevel latent variable models 
to empirical data that contain complex measurement structures and unbalanced data. A 
subset of data from Programme for International Student Assessment (PISA; Adams & 


Wu, 2002) were analyzed to illustrate a contextual effect model. 
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2 Contextual Effects in a Nonlinear Multilevel Latent Variable Model 

The particular contextual effect of interest in the current study is one that occurs 
when a group-level characteristic is measured by individual-level variables, and the 
individual-level variables are in turn measured by categorical manifest variables. This 
study considers a contextual effect as a compositional effect that captures the influence 
of contextual variables on individual-level outcomes, controlling for the effect of the 
individual-level predictor. 
2.1 Structural Models 


In traditional HLM, a compositional effect 6, can be defined as follows: 


Yi = Boj + Brj(Xij — Xj) + ri, 
Boj = Yoo t+ Yo1(Xj — X..) + Ugj, 
Bij = Yi0, 


Be = 01 — Y10- (1) 


In Equation (1), Y;; and X;; denote the outcome and predictor values of individual 7 in 
level-2 unit j, respectively. For the level-1 equation, the predictor values are centered 
around the group means Xj. For the level-2 model, the predictor values are centered 
around the grand mean X... 

In typical educational research settings, Y;; and X;; can be constructed by summing or 
averaging item scores from self-reports or other instruments. The random effects rj; and 
gj are assumed to be normally distributed with zero means and variances o* and 70, 
respectively. In this particular definition of a contextual effect as a compositional effect, 
the slope 19 is the same across the level-2 units (a fixed effect). 

In a nonlinear multilevel latent variable model, the predictors and outcomes become 
latent variables that are denoted as y;; and ¢;;. Those latent variables are connected 


to manifest variables through measurement models. For notational simplicity, latent 
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individual deviations from latent group means can be defined as 6;; = ¢ij — ¢j, and 
group mean deviations from the latent grand mean can be defined as 6; = ¢ ; —¢.. Then 


the latent variable counterpart to Equation (1) is: 


Hii = Boj + Brjoij + ij, 
Boj = Yoo + 019; + Uj, 
Bij = 10, 


Bc = Yo1— Y10 (2) 


Note that we have centered the latent level-1 predictor values around the group means, 
and the latent level-2 predictor values around the grant mean, maintaining comparability 
with Equation (1). Similarly, the random effects r;; and uo; are assumed to be normally 
distributed with zero means and variances o* and 1 0, respectively. 

For identification purposes, we impose the restriction of ¢ = 0 to fix the location 
of the predictor latent variable in the model. This implies that 6; = ¢; and the level-1 
latent predictor value is expressed as a group mean plus a deviation term ¢;; = ¢j + jj. 
To identify the location of the outcome latent variable, we set the intercept Yoo to 0 
as well. To identify the scale of the latent variables, we impose additional restrictions 
on 6; and rj. These are disturbance terms, so they should have zero means and as 
is customary in other item response theory modeling situations, we set their variances 
to unity, ie., var(dj;) = 1 and o? = 1. This particular identification constraint leaves 
open the possibility to estimate the variance of ¢j, which will be denoted y, as well 
as the variance of uj, which is To. We also make the regression model specification 
assumption that the deviation ¢ ; and the random effect uo; are statistically independent. 
2.2 Measurement Models 

The measurement models define the relationship between manifest variables and la- 


tent variables. For brevity, only the measurement models of the latent predictor variable 
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¢ij will be described in this section, since the measurement models for the latent outcome 
yi; can be defined analogously. 

When manifest variables are ordinal response variables with multiple categories (in- 
cluding 0-1 responses), as is often the case with instruments used in educational re- 
search, a logistic version of Samejima (1969)’s classical graded response model can be 
utilized. Let item / have K; ordered categories. The conditional cumulative probability 


for a response in category k € {0,1,...,K; — 1} and above are defined as follows: 


To(gij) = L 
1 
Ti (€ij) = 14 exp| (cy) aij)” 
Tk,-1(G4) - 


1 +exp[—(cx,-11 + agi) ]’ 


where C1],...,C€ K-11 tepresent a vector of K; — 1 item intercept parameters, and a, is the 
item slope. The category response probability is defined as the difference between two 


adjacent cumulative probabilities: 


Pri) = Tr(a) — Tear (Ca), (4) 


for k € {0,1,...,K; — 1}, where Tx,(¢i;) = 0 
Let Xj € {0,1,...,K; —1} be a random variable representing the ith individual's 
response in the jth level-2 unit to the /th item, and let x; be a realization of X;;). Condi- 


tional on ¢;;, the distribution of X;;) is multinomial with trial size 1 in K; categories: 


fo(xijleij) = in P(E mel i), (5) 


where X;(x;j)) is an indicator function which equals 1 if and only if x;;) is equal to k, and 
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0 otherwise. Note that missing observations are handled naturally in this conditional 
multinomial formulation. If x;;) is a missing data point, the indicator function is always 
0, and hence only observed responses contribute to the measurement of ¢;j. 

The conditional density f(x;j)|¢i;) is indexed by 0, which is our generic notation for a 
vector of all free parameters in the model that includes the item intercepts and slopes, the 
fixed effects (701, 10), and the variance components (To9 and ip). Let x;j = (xis, eas XijL,)’ 
be a L, x 1 vector of item responses from individual 7 in level-2 unit j to the L; items 
measuring ¢;;. Invoking the critically important assumption of conditional independence 


of item responses given the latent variable, we may write 


fo(xiilGii) = [Li xilCii) = folxilC j, Oy), (6) 


where the last equality follows from the fact that ¢;; = ¢j + 4j. 
2.3 Observed and Complete Data Likelihoods 

Similar to the case of ¢;;, let us consider the measurement of 77;;. Let Ly be the number 
of manifest variables for 4;;._ Again under conditional independence, the conditional 


response probabilities factor into item response probabilities: 


fo (yijlnij) =F] Yijl ij), (7) 


where y;; is the Ly x 1 vector of item responses from individual 7 in level-2 unit j to the 


outcome measures. Recall from Equation (2) that 


ij = Boj + P1joij + ij = Yoo + ‘Yo1Sj + Uoj + 109i) + Ti: 


We note that given fixed effects, if we knew the random effect u;, the latent group mean 
¢j, the latent deviation term 6;;, and the equation disturbance term 1;j;, 4;; would be 


completely determined. This implies that we may rewrite the conditional distribution of 
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Yij as: 


fol yijlnii) = folvisl? j- Uoj, 5i-Ti)- (8) 


If we integrate r;; out of Equation (8), we have left 


folyislé.j, Uoj, Oi) = | fol vile i406, ria) f (rig )d (Tig), (9) 


where f(r;j) is the density of a standard normal random variable, given preceding as- 
sumptions about the disturbance term. Bringing in results from Equation (6) and in- 
tegrating out 6;; yields a conditional density that depends only on the level-2 latent 


variables and random effects 


fol Yij, xif|Cj,Uoj) = | folie, bi) fol ViilE jr Uoj, Oij) f (Oi) 4 (Oi), (10) 


where f(0;;) is the density of a standard normal random variable. Equation (10) makes 
it clear that we assume, under correct model specification, the outcome measures (y;;) 
and predictor measures (x;;) are conditionally independent. 

Let J and J; stand for the number of level-2 units and number of individuals in 
level-2 unit j. Let Yj; = {yi} , and X; = {xi}j , represent the collected responses 
to the outcome manifest variables and predictor manifest variables, respectively, from 
all individuals in level-2 unit 7. We now make the critical assumption of conditional 
independence again - that the individuals are independent conditionally on the level-2 
latent variables/random effects ¢ ; and uo;. Thus the conditional joint density of Y; and 


Xj becomes: 
Tj 
fo(¥;, XjI¢ j, Ug;) oo [ [foi xilCj, Ugj)- (11) 
i=1 


Integrating out the level-2 latent variables and random effects yields the marginal prob- 
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ability, wherein we have utilized the independence of ¢ j and ug: 


YX) = ff it Yj, XZ joj) FG) f (toy) (Sj) (oy) (12) 


By this point we have integrated all latent variables and random effects out of the 
joint probabilities. We now make the routine multilevel modeling assumption that the 
level-2 units are the independent sampling units. Upon observing Y; and X; and treating 


them as fixed, the marginal (observed data) likelihood function for the entire sample is 
L(@|Y,X) I] fo(¥ (13) 


where Y = {Y;} = and X = {X;} Sy collect together the full set of outcome and predic- 
tor observed variable responses, respectively. Directly maximizing this marginal likeli- 
hood function over 6 would lead to the maximum marginal likelihood estimator of the 
structural parameters. 

An obvious computational limitation to the direct marginal likelihood approach is the 
integration involved in arriving at the observed data likelihood. All of the integrals must 
be approximated numerically, which can be computationally challenging. An alternative 
stance is to treat the random effects and latent variables rj;, 6;;, ¢.;, and uo; as missing 
data. This leads to a missing data formulation of the latent variable model. Had the 


missing data been observed, the complete data likelihood function can be written as 


i 
L(6|Y,X, Z) = Tit Vislo.j Uoj, Oi Ti) folXilS j, bij) FG if (rij) fo(uo;) folS. Cj), (14) 


fN(e 


i J 
where Z collects together all the level-1 random effects/latent variables { {rj bij} | Sa 
J= 
as well as those at level-2 {u9;,¢.;} a In other words, Z represents the “missing data.” 


This missing data formulation prompts us to consider an alternative estimation ap- 
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proach that eschews numerical integration. In particular, the missing data may be “filled 
in” by drawing imputations from their posterior predictive distribution f(Z|Y,X,6). 
Note that in our case the posterior predictive distribution is proportional to the com- 
plete data likelihood, greatly facilitating the use of MCMC sampling methods to draw 
from the posterior. The imputations lead to complete data sets, and the complete data 
likelihood function is much easier to handle than the observed data likelihood function 
due to its completely factored form. Instead of directly solving the observed data op- 
timization problem, a sequence of complete data optimizations can iteratively improve 
the parameters estimates until convergence. 

3 Metropolis-Hastings Robbins-Monro Algorithm for Contextual Models 

3.1 Metropolis-Hastings Robbins-Monro Algorithm 

The MH-RM algorithm was initially proposed by Cai (2008) for nonlinear latent 
structure analysis with a comprehensive measurement model, and the application of 
the algorithm has been expanded to other measurement and statistical models (e.g. 
Cai, 2010a; Cai, 2010b; Monroe & Cai, 2014). The MH-RM algorithm combines the 
Metropolis-Hastings (MH; Hastings, 1970; Metropolis, Rosenbluth, Rosenbluth, Teller, & 
Teller, 1953) algorithm and the Robbins-Monro (RM; Robbins & Monro, 1951) stochastic 
approximation algorithm. 

Utilizing the missing data formation of the latent variable model, the random effects 
and latent variables are treated as missing data. Once the missing data are “filled in” 
by the MH sampler, complete data likelihoods can be optimized iteratively. Because 
imputation noise is introduced in the MH step, the RM algorithm is used to filer out 
the noise. Let the parameter estimate at iteration t be denoted 0), the (t + 1)th itera- 
tion of the MH-RM algorithm consists of three steps: Stochastic Imputation, Stochastic 
Approximation, and Robbins-Monro Update. 

Step 1. Stochastic Imputation 


Draw M; sets of missing data, which are the random effects and latent variables, from a 


Estimation of Contextual Effects 12 


Markov chain that has the posterior predictive distribution of missing data f(Z|Y,X, 0“)) 


as the target. Then, M; sets of complete data are formed as follows: 
{Y, xX, ZY in = 1, Me} . (15) 


Step 2. Stochastic Approximation 
Let 
s(0IYX, Zh) = Flog L(0lW,x, Zh") (16) 


denote the gradient vector of the complete data log-likelihood function, evaluated at the 
current parameter value 0! and missing data imputation Zo. We first compute the 


sample average of gradients of the complete data log-likelihood: 
1< (+1) 
S41 = ) 8(0lY,X, Zn”). (17) 
M: > 


By Fisher’s Identity (Fisher, 1925), the conditional expectation of the complete data gra- 
dient vector over the posterior distribution of the missing data is the same as the gradient 
vector of the observed data log-likelihood, under mild regularity conditions, i.e., 

) ) 

< log L(8|¥,X) = | ws L(OlY, X, Z) f (ZY, X, @)aZ. (18) 
In other words, though noise-corrupted, 8,1 gives the direction of likelihood ascent 
because it is a Monte Carlo approximation of the conditional expected complete data 
gradient vector (the right hand side of Equation 18), which is also an approximation of 
the observed data gradient vector (the left hand side of Equation 18). 

Step 3. Robbins-Monro Update 


To improve stability and speed, we also compute a recursive approximation of the con- 
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ditional expectation of the information matrix of the complete data log-likelihood: 


Mt 


1 
Ti Tees | ae y° H(eM|y,x,z0t)) —r |, (19) 
m=1 
where 
ay 


is the complete data information matrix, i.e., the negative second derivative matrix of 


the complete data log-likelihood. Updated parameters are computed recursively: 
a) = 0) + 6 (8141), (20) 


where {€;;t > 0} is a sequence of gain constants (to be elaborated). 

The iterations are started from initial values 6°) and a positive definite matrix To. 
They can be terminated when the changes in parameter estimates are sufficiently small. 
As a practical method for convergence check, Cai (2008) proposed to monitor a ”win- 
dow” of the largest absolute differences between two adjacent iterations. Cai (2008) 
suggested 3 as a reasonable width of the window to be monitored in practice. Cai (2008) 
showed that the MH-RM iterations converge to a local maximum of the observed data 
likelihood L(6/Y, X) with probability one as t increases without bounds. 

The gain constant €; is a sequence of decreasing non-negative real numbers such that 
er € (0,1), VRoer = o, and VP2y)e7 < oo. In practical implementations of MH-RM, 
the starting parameter values 6) are often sufficiently far away from the mode of the 
marginal likelihood that extra care must be taken with the gain constant sequence so that 
MH-RM does not terminate prematurely. We typically implement a 3-stage procedure 
wherein the first M1 iterations of MH-RM uses non-decreasing gain constants to quickly 
move the provisional estimates to a vicinity of the final solution. The next M2 iterations 


use the same non-decreasing gain constants but the estimates are averaged to start the 
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final MH-RM iterations with decreasing gain constants. For the last stage, the sequence 
of gain constants is taken to be €; = 0.1/(f +1)°7° as some experimentation. 
3.2. Approximating the Observed Information Matrix 

One of the benefits of using the MH-RM algorithm is that the observed data infor- 
mation matrix can be approximated as a byproduct of the iterations. The inverse of 
the observed data information matrix becomes the large-sample covariance matrix of 
parameter estimates. The square root of the diagonal elements are the standard errors. 


Utilizing Fishier’s Identity, the gradient vector is approximated recursively, 
S41 = 81+ €¢{8141 — St}, (21) 


where 8; is defined as Equation (17). A Monte Carlo estimate of the conditional expec- 
tation of the complete data information matrix minus the conditional covariance of the 


complete data gradient vector is defined as follows: 


HOO ly, x 20D) — sal, x, 20) say, x, ZV] 2. %@2) 
j=l 


A more stable estimate can be found by further recursive approximation: 
Gi =G;+ e{Giyy = G:}. (23) 
Finally, the observed information matrix is approximated as 
Titi = Giri t+ $4184.41. (24) 


Cai (2010a) discussed the rationale behind this approximation as a recursive applica- 
tion of Louis’s (1982) formula. The main benefit is that the information matrix becomes a 


by-product of the MH-RM iterations. Another practical option for approximating the ob- 
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served information matrix is a direct application of Louis’s (1982) formula, in which the 
gradient vector and the conditional expectations are approximated directly after con- 
verge of the MH-RM algorithm using additional Monte Carlo samples. In this study, 
standard errors obtained by the first method are called recursively approximated standard 
errors and those from the latter are called post-convergence approximated standard errors. 

4 Simulation Studies 

4.1 Simulation Study 1: Comparison of Estimation Algorithms 

4.1.1 Methods 

The first study examined parameter recovery and standard errors across two algo- 
rithms, MH-RM algorithm and an existing EM algorithm. The data-generating and fitted 
models followed Equation (2). The simulated data are balanced in that the number of 
level-2 units (ng) is 100 and the number of level-1 units per group (np) is 20. The gener- 
ating ICC value for the latent predictor was 0.3. 

For the measurement model, five dichotomously scored manifest variables were gen- 
erated for each latent trait (i-e., 7, and ¢) using the graded model in Equation (3). For 7;;, 
the manifest variables are Y1, Y2, Y3, Y4, and Ys. For ¢;;, which is the sum the level-2 latent 
group mean and the deviation terms (¢ ; + 4;;), the manifest variables are X1, Xz, X3, X4, 
and Xs. The item parameters were the same across levels, representing cross-level mea- 
surement invariance. 

We attempted 100 Monte Carlo replications. The first 10 data sets were analyzed 
using two methods: an MH-RM algorithm implemented in R (R Core Team, 2012) and 
an adaptive quadrature based EM approach implemented in Mplus (Muthén & Muthén, 
2010). The MH-RM algorithm’s convergence criterion was 5.0 x 10~°, and the maximum 
number of iterations for the first two stages of MH-RM with constant gains were M1 = 
100 and M2 = 500. To calculated post-convergence approximated standard errors, 100 
to 500 additional random samples were used. All replications converged within 600 


MH-RM iterations with decreasing gains. 
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4.1.2 Results 

The generating values and the corresponding estimates for the compositional effect 
from different algorithms are summarized in Table 1. The first column contains the true 
parameters for the measurement and structural parameters. The second set of columns 
and the third set of columns include the estimates and SEs from EM with different num- 
bers of adaptive quadrature points (qp=5 and qp=14). The default number of quadrature 
points is 15 in Mplus, but the computer cannot handle 15 quadrature points for this four- 
dimensional model. The maximum possible number of quadrature points was 14 for a 
compositional effect model. A smaller number of quadrature points (5) was tested to 
compare point estimates and standard errors. The fourth set of columns includes the 
corresponding point estimates and standard errors using the MH-RM algorithm. 

The means of point estimates from different algorithms are generally very close to 
one another. For structural parameter estimates, the number of quadrature points does 
not appear to make a large difference, though 14-quadrature-point estimates are slightly 
closer to the MH-RM estimates and the generating values in terms of To9 and . Standard 
errors are also very similar. 

For measurement parameter estimates, both the means of point estimates and the 
standard errors were the same up to the second decimal place across different numbers 
of quadrature points. The largest difference in average point estimates between EM and 
MH-RM was 0.02, indicating that the two approaches yield highly similar estimates. 
However, mean standard error estimates are slightly different between MH-RM and EM 
results in that the standard error estimates from MH-RM algorithm for intercepts are 
smaller than those from the EM algorithm. The biggest difference in standard error 
estimates for measurement parameters between two algorithms was 0.13. 

The natural logarithm of standard error estimates from EM algorithm, MH-RM algo- 
rithm (post-convergence approximated standard errors) are plotted against the natural 


logarithm of empirical standard deviations of point estimates across the Monte Carlo 
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replications in Figure 1. The estimates are clustered along the diagonal reference line, 
indicating that the estimated standard errors are generally close to the Monte Carlo 
standard deviations of the point estimates, except for the intercept parameter standard 
errors, which appear to be underestimated when the post-convergence approximation is 
used for the MH-RM algorithm. 

With regards to computing time, when one processor was used for estimation, EM 
with 5 quadrature points generally required a small amount of time, while EM with 
14 quadrature points generally required over an hour. The MH-RM algorithm required 
about 40 minutes. Note that MH-RM is implemented in R (an interpreted language) 
with explicit looping, while Mplus is written in FORTRAN (a compiled language). As 
an interpreted language is expected to be several orders of magnitude slower compared 
to a compiled language in terms of looping, a direct comparison is inappropriate. What 
we can safely conclude is that when ported into a compiled language, MH-RM is poised 
to be substantially more efficient. 

To examine the performance of the MH-RM algorithm further, all 100 generated data 
sets were analyzed, and the results are summarized in Table 2. The means of point esti- 
mates are reasonably close to generating values in general, with slight underestimation 
of variance components. The Monte Carlo standard deviations of parameter estimates 
(column 5) are also similar to standard error estimates from both EM and MH-RM (col- 
umn 4 and 6); the largest difference is 0.02. With respect to measurement parameters, 
the average item parameter estimates are very close to generating values. 

However, we see that recursively approximated standard errors are generally closer 
to the Monte Carlo standard deviations of item parameter estimates than the post- 
convergence approximated standard errors. More specifically, the most prominent differ- 
ences are found in the standard errors of intercept parameters, where post-convergence 
approximated standard errors for item intercept parameters are underestimated. There- 


fore, we find that recursively approximated standard errors perform better than post- 
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convergence approximated standard errors. With that said, a drawback of using recur- 
sively approximated standard errors is the requirement of a relatively larger number 
of main MH-RM iterations (at least 1000 in our experience) to reach a positive definite 
approximate observed information matrix. For this reason, post-convergence approx- 
imated standard errors are adopted for the remaining simulations in this study since 
this approach gives proper standard error estimates for structural parameters and can 
be faster. 

Finally, 95% confidence intervals for each parameter were calculated. The post- 
convergence approximated standard errors were used to form these two-sided Wald- 
type confidence intervals. The percentages of intervals that cover the generating values 
are reported in the last column of Table 2. Based on the 100 replications performed, 
coverage of structural parameters appears well calibrated in general. For measurement 
parameters, the coverage rates tend to decrease as the magnitude of parameters becomes 
larger. Coverage rates are at the lowest for the more extreme intercept parameters due 
to their underestimated standard errors. 

4.2 Simulation Study 2: Comparison of Models 

The second simulation study was conducted to examine how measurement error and 
sampling error may influence compositional effect estimation across different conditions 
with both a traditional HLM and a multilevel latent variable model. 

4.2.1 Simulation Conditions 

A total of 30 data generating conditions were examined: 2 compositional effect sizes, 
x 3 sampling conditions x 2 ICC sizes x 2 measurement condition + 6 conditions for a 
model with no compositional effect. 

First, two different sizes of compositional effect were considered in this study. The 
generating value of yo; was 1.0. The generating value of 19 was either 0.5 or 0.8, giv- 
ing a compositional effect of 0.5 or 0.2, respectively. Second, the combination of large 


(ng=100, np=20) and small (ng=25, np=5) numbers of groups and individuals makes a 
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total of 4 different sampling conditions. However, the combination of 25 groups and 
group size of 5 leads to too small a total sample size (125), which is not entirely appro- 
priate for the stable estimation of a high-dimensional latent variable model. Therefore, 
only three different sampling conditions were used for this simulation study. For latent 
predictor ICC levels, 0.1 and 0.3 were used to generate small- and a large-ICC conditions 
by manipulating y — the variance of ¢;. Finally, two different measurement structures 
were considered. The observed variables in the first condition were dichotomous and 
in the second condition, they were 5-category ordinal responses. The true item param- 
eters are given in in Table 3. Additionally, data were generated from a model with no 
compositional effect (7o1 = 10) with the first measurement condition and analyzed to 
examine empirical Type I error rates for the compositional effect estimates with both 
the traditional model and the latent variable model. In each condition, 100 Monte Carlo 
replications were attempted. 
4.2.2 Analysis 

Because all simulated data sets have the true generating values of 7;; and ¢;;, these 
values (true scores) can be analyzed using a traditional model. The resulting param- 
eter estimates can be considered gold standard estimates that are influenced only by 
sampling fluctuations but not by measurement conditions. Therefore, each data set has 
three sets of parameter estimates: 1) estimates from analyzing the generating values of 
yij and ¢;; with a traditional HLM, which is treated as the gold standard, 2) estimates 
obtained by applying latent variable model, and 3) the estimates from analyzing the 
observed summed scores of outcomes and predictors with the standard approach us- 
ing manifest variables. All of the traditional HLM analyses were conducted using an R 
package nlme (Pinheiro, Bates, DebRoy, Sarkar, & R Core Team, 2012). 
4.2.3 Evaluation Statistics 

To compare these three sets of estimates, three statistics are calculated: 1) the percent- 


age bias of the estimate relative to the magnitude of its generating value, 2) the observed 
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coverage rate of the 95% confident interval, and 3) the observed power to detect the 
compositional effect of interest as significant. 

It should be noted that the regression coefficient estimates from the observed sum 
score analysis using a traditional multilevel model are not on the same scales as those 
obtained using the latent variable approach, which yields naturally standardized fixed 
effects coefficients due to the identification conditions discussed earlier. To make the co- 
efficient estimates more comparable, the estimates from the traditional HLM approach 
were standardized by multiplying the parameter estimates by the ratio of standard de- 
viation of the predictor to the standard deviation of the outcome. 

4.2.4 Results 

Convergence rates and mean computing time across generating data conditions are 
reported in Table 4. Only converged replications were used to calculate evaluation statis- 
tics. The worst cases of non-convergence occur when the number of level-2 units is low 
and the ICC is small. This is particularly true for the second measurement condition 
when a substantially larger number of item parameters for the multiple-categorical items 
must be estimated from the data. 

Let us examine the first measurement condition where all items are dichotomously 
scored. Because a compositional effect estimate is defined as the difference between ¥o1 
and 19, those two parameter estimates are examined together, along with the compo- 
sitional effect estimate (the difference) itself. Relative percentage biases in Yo; and ¥j9 
are summarized in Figure 2. When the generating values of 4;; and ¢;; were analyzed, 
the bias of Yo, ranged from 1% to 15% across the sampling conditions. Latent variable 
modeling resulted in a similar magnitude of bias. But traditional HLM resulted in more 
substantial bias in both ¥o; and 419 (from 30% to 70%) (see the gray bars in Figure 2). 

The biases in the traditional HLM estimates of the regression coefficients lead to an 
interesting pattern of biases in the compositional effect estimate. The bias can be as 


small as 8% when the predictor ICC is large and the sampling condition favorable (more 
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individuals in each group), but the bias can be as large as 80% when the ICC is small 
and the group size is small (see Figure 3). It is also noteworthy that the bias in the 
compositional effect estimate from the traditional HLM model can also be positive when 
the ICC is large and the contextual effect size is small (see the last plot of Figure 3). 

On the other hand, comparing the two plots in Figure 4 with the first two plots in 
Figure 3 reveals that the performance of the traditional HLM and the latent variable 
model in terms of estimating 01, Yio, aS well as the compositional effect, is highly 
similar across the two measurement conditions. This indicates the measurement model 
is a less influential source of bias in this study. 

To examine the standard error estimates, the coverage rates of the 95% confidence 
intervals for the true compositional effect were calculated. Results from the condition 
with large true compositional effect and the first measurement condition are summarized 
in Figure 5. When generating values are analyzed, the coverage rates across sampling 
conditions are generally close to 95%, except for the case where ICC is small and the 
number of groups is also small. In this case the coverage rate can be as low as 85%. The 
coverage rates based on the latent variable model parameter estimates were similar or 
slightly worse than those from generating value analysis. Coverage rates with traditional 
HLM estimates can be problematic when the number of individuals per group and the 
ICC are both low. 

To examine how researchers can make different inferential decisions when they apply 
a traditional model and a latent variable model, the empirical Type I error rates are cal- 
culated for the conditions where the true data generating model has zero compositional 
effect. Figure 6 shows empirical Type I error rates across ICC and sampling conditions 
for the first measurement condition. 

Generating value analysis yields Type I error rates of .05 to .07 across sampling con- 
ditions. The latent variable model maintains similar Type I error rate calibration, except 


for the cases when the number of individuals per group is small. For traditional HLM 
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analysis, Type I error rate inflation is dramatic. Only under the conditions when a small 
predictor ICC is coupled with a small number of group or a small number of individuals 
per group, does the traditional method maintains proper Type I error rate. 

Turning to statistical power, when a compositional effect is large (see Figure 7), gen- 
erating value analysis yields power of about .85 when ICC is large and the number of 
groups is also large. When ICC is small, power decreases to .35 even with favorable 
sampling conditions. The lowest statistical power (.15) is found when predictor ICC is 
small and the number of groups is also small. 

The patterns are similar for the latent variable analysis. But when ICC is small, and 
the number of individuals per group or the number of groups is small, latent variable 
modeling actually yields a slightly higher percentage of significant compositional ef- 
fects. While the traditional HLM analysis yields a very high percentage of significant 
compositional effects when the ICC is large and the number of individuals per group is 
also large, the power decreases remarkably when ICC is small and when the sampling 
condition deteriorates (i.e., when the number of individuals per group or the number of 
groups is small). Also, the relatively high statistical power associated with the traditional 
HLM analysis is partially attributable to the inflated Type I error rates observed earlier - 
the test is liberal overall. 

In summary, relative bias of ¥o1 are 419 are large when the traditional HLM is applied. 
This is consistent with findings from previous research. However, the relative bias in the 
difference between the two coefficients (the compositional effect estimate) can sometimes 
be kept at bay, since both coefficients can be biased in the same direction. We note that 
the true compositional effect can be estimated with traditional methods when ICC is 
large and the sampling condition is favorable. However, Type I error rates are severely 
inflated under this very condition, when the true compositional effect is zero. Thus this 
model can frequently make the false claim that there is a significant compositional effect 


even when there is none. 


Estimation of Contextual Effects 23 


On the other hand, biases in point estimates seems rather unavoidable when the 
sampling condition is not favorable (small group sizes and low sample size in general), 
and especially when ICC is also low. Even with generating true scores the estimates show 
some biases. However, the latent variable model tends to yield less biased estimates in 
general. When ICC is small and the number of individuals per group is small, the Type 
I error rate associated with the latent variable compositional effect estimate increases 
slightly, but the magnitude of the elevation is still much more preferable compared to 
the traditional HLM analysis. We also find that the main issue with the latent variable 
model approach in terms of sampling conditions is related more to small number of 
groups rather than to the number of individuals per group. As long as the number of 
groups sampled is sufficiently large, the performance of the latent variable modeling 
approach can be satisfactory. 

Finally, we find that the measurement structure to be less influential in this study. It 
certainly may be due to the particularly set of item parameters chosen. The results from 
the second measurement condition, however, indicate that the estimation of too many 
item parameters with limited sample size can possibly undermine the performance of 
the latent variable modeling approach. 

5 Empirical Application: ’Big-fish-little-pond” Effect 
5.1 Data 

For this compositional effect demonstration, a subset of publicly available data from 
The Programme for International Student Assessment (PISA 2000; OECD, 2000) were ex- 
tracted and analyzed. PISA is a large international comparative survey. A large amount 
of student and school level information that covering cognitive and affective domains 
was collected with a complex sampling scheme. 

Though 42 countries participated in the data collection, a sample of students from 
the US was analyzed in this study for the purpose of illustration only. Originally, a total 


of 129 reading items were administered to estimate country level reading literacy using 
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a balanced incomplete block design. However, for simplicity, only booklets 8 and 9 were 
used for this analysis. These two booklets included 33 reading items, but 1 item was 
dropped prior to our analysis because all item responses to this item were scored as 
incorrect, which meant that the item contributes no information. Therefore, the analysis 
data set contained responses to 32 reading items (3 ordinal items with 3 categories each 
and 29 dichotomous items) from 667 students nested within 141 schools in the US. The 
number of students within a school ranged from 1 to 8 in this analysis data set. The 
outcome variable is the students’ self concept in reading. It was measured by three items 
(CC02005, CC02009, and CC02Q23). Each item has a Likert-type scale, ranging from 1 
(disagree) to 4 (agree). 

5.2 Results 

The structural parameter estimates from the multilevel latent variable analysis (EM 
algorithm and the MH-RM algorithm) and traditional HLM analysis are summarized 
in Table 5. In general, a positive and significant within-school coefficient 419 is found 
across different models and algorithms. The between-school coefficient estimate (491) 
was not statistically significantly different from 0 when the multilevel latent model was 
applied (with EM or MH-RM algorithm), while the estimate was significantly different 
from 0 when the traditional HLM was applied. 

The compositional “big-fish-little-pond” effect is calculated by subtracting 419 from 
¥Yo1. The direction of the compositional was negative. This is consistent with reports 
from previous research (Marsh et al., 2009). It indicates that two students who have the 
same levels of reading achievement can have different level of academic self-concept, 
depending on school-level academic achievement. As the compositional effect is nega- 
tive, the students who attends a higher achieving school tend to have lower academic 
self-concept when compared with a student who attends a lower achieving school. On 
the other hand, a student who belongs to a lower achieving school is expected to have 


higher academic self-concept when compared with a student who belongs to a higher 
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achieving school — just like a fish that feels big if the pond in which it lives is small. 

However, in terms of the statistical significance of the compositional effect, the effect 
is not significantly different from 0 if we use the estimates and standard errors from the 
traditional HLM; but if we use the latent variable estimates, the compositional effect is 
significant. This result is consistent with what we found via the simulation study in that 
the power of the latent variable model to detect a compositional effect is higher than 
that of the traditional method, when the data set is associated with a sufficiently large 
number of schools and a small number of students per school. 

Finally, the item parameter estimates from the MH-RM algorithm are plotted against 
those from the EM algorithm in Figure 8. As can be seen, the estimates are very 
close. Standard errors of the item parameters exhibited a similar pattern as found previ- 
ously (see Figure 9), confirming that the post-convergence approximation method yields 
slightly smaller standard errors, while the recursive approximation tends to yield larger 
standard errors. 

6 Conclusion 

This study is situated in a current stream of research (e.g., Goldstein & Browne, 
2004; Goldstein, Bonnet, & Rocher, 2007; Kamata, Bauer, & Miyazaki, 2008) that tries 
to develop a comprehensive, unified model that benefits from both multilevel modeling 
and latent variable modeling by combining multidimensional IRT, factor analytic mea- 
surement modeling, and the flexibility of nonlinear structural equation modeling in a 
multilevel setting. Considering that one of the pressing needs in developing a unified 
model is an efficient estimation method, the current study contributes to nonlinear mul- 
tilevel latent variable modeling by extending an alternative estimation algorithm. The 
principles of MH-RM algorithm and previous applications (Cai, 2008) suggest that the 
algorithm can be more efficient than the existing algorithms when a model contains a 
large number of latent variables or random effects. 


The primary purpose of this study was to improve estimation efficiency in obtaining 
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maximum likelihood estimates of contextual effects by adopting the MH-RM algorithm 
(Cai, 2008, 2010a, 2010b). R programs implementing the MH-RM algorithm were pro- 
duced to fit nonlinear multilevel latent variable models. Computation efficiency and 
parameter recovery were assessed by comparing results with an EM algorithm that uses 
adaptive Gauss-Hermite quadrature. Results indicate that the MH-RM algorithm can ob- 
tain maximum likelihood estimates and their standard errors efficiently. Considering the 
difference between an interpreted language (R) and a compiled language (FORTRAN) 
in which EM is implemented, substantial improvement in efficiency is expected if the 
MH-RM estimation code is ported to a compiled language in the future. 

The second purpose of this study was to provide information about the performance 
of the nonlinear multilevel latent variable model in comparison to traditional HLM 
through a simulation study that covers various sampling and measurement conditions. 
Results suggest that nonlinear multilevel latent variable modeling can more properly 
estimate and detect a contextual effect than the traditional approach in most conditions. 
Type I error rates of the compositional effect estimate from the traditional model can also 
be substantially elevated whereas latent variable modeling leads to more proper Type I 
error rate calibration. 

The third purpose of this study was to provide an empirical illustration using a 
subset of data extracted from PISA (Adams & Wu, 2002). A negative compositional 
effect was found for the relationship between reading literacy and academic self-concept, 
supporting the results from previous studies, on the “Big-fish-little-pond” effect (e.g. 
Marsh et al., 2009). The compositional effect was statistically significant at the .05 level 
when the nonlinear multilevel latent variable model was applied. On the other hand, 
the traditional HLM approach could not detect a statistically significant effect. 

This study is limited several important ways. The latent variable model itself contains 
a series of strong specification and distributional assumptions. These assumptions re- 


quire careful checking in empirical settings because the violations of these assumptions 
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can lead to substantial unknown estimation biases. The simulation study only exam- 
ined a limited set of conditions with fixed item and structural parameters. The data 
generating and fitted models in the simulation study also do not contain any model 
specification error. More complex structural models should also be considered. In fu- 
ture research, an obvious extension of the model discussed here is one that includes 


cross-level interactions in latent variables. 
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Figure 1: Comparisons of standard errors for item parameters. 
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Figure 2: Relative percentage bias in ‘0 (first two plots) and 419 (last two plots), large 
true compositional effect, measurement condition 1, by the sampling conditions (number 


of individuals in each group, and number of groups). 
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Figure 3: Relative percentage bias in compositional effect estimate Yo, — Yi0, large true 
compositional effect (first two plots) and small true compositional effect (last two plots), 
first measurement condition, by the sampling conditions (number of individuals in each 


group, and number of groups). 
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Figure 4: Relative percentage bias in compositional effect estimate Yo, — ¥i0, large true 
compositional effect, second measurement condition, by the sampling conditions (num- 


ber of individuals in each group, and number of groups). 
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Figure 5: 95% compositional effect estimate confidence interval coverage rate, large true 
compositional effect, first measurement condition, by the sampling conditions (number 


of individuals in each group, and number of groups). 
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Figure 6: Empirical Type I error rates for the compositional effect estimate, first mea- 


surement condition. 
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Figure 7: Percentage of significant compositional effect (estimated power), small true 
compositional effect (first two plots) and large true compositional effect (last two plots), 


first measurement condition. 
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Figure 8: Item parameter estimates based on the EM and MH-RM algorithms for the 
PISA 2000 USA data analysis. 
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Figure 9: Standard errors of item parameters based on the EM and MH-RM algorithms 
for PISA 2000 USA data analysis. Method 1 uses recursively approximated standard 


errors. Method 2 uses post-convergence approximated standard errors. 
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Table 1: Generating values, EM estimates, and MH-RM estimates for a compositional 


effect model 


Structural Parameters 


EM (Sqp) EM (14qp) MH-RM 
@ (0) E{se(6)} E(6) Efse(6)} (6) Ef{se(6)} 
YO1 1.00 1.02 0.19 1.01 0.19 1.00 0.18 
10 0.50 0.52 0.05 0.51 0.05 0.52 0.09 
TO 1.00 0.90 0.16 0.91 0.17 0.93 0.16 
yp 0.43 0.40 0.07 0.42 0.07 0.42 0.07 
Measurement Parameters 
Ax 0.80 0.79 0.07 0.79 0.07 0.79 0.08 
Ax2 1.00 1.01 0.08 1.01 0.08 1.00 0.09 
Ax3 1.20 1.24 0.09 1.24 0.09 1.24 0.11 
AxA 1.40 1.39 0.10 1.39 0.10 1.39 0.12 
Ax5 1.60 1.67 0.14 1.67 0.14 1.69 0.15 
ay) 0.80 0.78 0.06 0.78 0.06 0.78 0.06 
Ay2 1.00 1.00 0.07 1.00 0.07 1.00 0.07 
Ay3 1.20 = 1.23 0.09 1.23 0.09 1.23 0.08 
Ay 1.40 1.40 0.11 1.40 0.11 1.40 0.10 
Ay5 1.60 1.61 0.13 1.61 0.13 1.60 0.12 
Cx1 -0.80 -0.75 0.08 -0.75 0.08 -0.75 0.06 
Cx2 0.00 0.02 0.08 0.02 0.08 0.02 0.05 
Cx3 1.20 = 1.30 0.11 1.30 0.11 1.29 0.08 
Cyd -0.70 -0.61 0.11 -0.61 0.11 -0.62 0.07 
Cs 0.80 0.92 0.14 0.92 0.14 0.92 0.08 
Cyt -0.80 -0.80 0.11 -0.80 0.11 -0.81 0.06 
Cy2 0.00 0.01 0.13 0.01 0.13 0.00 0.05 
Cy3 1.20 1.19 0.16 1.19 0.16 1.18 0.08 
Cy4 -0.70 -0.74 0.18 -0.74 0.18 -0.75 0.07 
Cys 0.80 0.79 G21. 0.79 0.21 0.78 0.08 
Computational Efficiency 
one processor 5~7 min 60~100min 35~40min 


Note. @ = Generating values; E(6) = mean of point estimates; E{se(6)} 
= mean of estimated SEs (post-convergence approximated SEs); a = 


item slope parameters; c = item threshold parameters. 
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Table 2: Generating values and MH-RM estimates for a compositional effect model 


Structural Parameters 
@  E(8) E{se1(@)} SD(8) E{se2(0)} 95% CI Coverage Using sel 


yo. 1.00 0.99 0.17 0.19 0.18 95.0 
Y10 0.50 0.50 0.06 0.07 0.09 95.0 
To 1.00 0.97 0.20 0.18 0.16 89.0 

py 0.43 0.43 0.08 0.09 0.07 89.0 

Measurement Parameters 
ay, 0.80 0.80 0.07 0.06 0.07 98.0 
ay2 1.00 1.01 0.10 0.09 0.09 91.0 
ay3 1.20 1.22 0.12 0.10 0.11 92.0 
ax, 140 1.40 0.12 0.10 0.13 84.0 
ay5 1.60 1.60 0.15 0.13 0.15 73.0 
ay, 0.80 0.80 0.07 0.07 0.06 95.0 
ay2 1.00 1.01 0.07 0.07 0.07 94.0 
ay3 1.20 1.21 0.10 0.09 0.09 86.0 
aya 140 1.39 0.10 0.09 0.10 89.0 
ay5 1.60 1.61 0.10 0.13 0.13 74.0 
Cy, 0.80 0.80 0.14 0.08 0.06 94.0 
Cy 0.00 0.00 0.07 0.09 0.05 95.0 
Cx3 -1.20) -1.22 0.09 0.12 0.08 91.0 
Cyg 0.70 0.69 0.12 0.11 0.07 89.0 
Cy5 -0.80 -0.80 0.12 0.15 0.08 89.0 
cy, 0.80 0.81 0.08 0.09 0.06 87.0 
cy2 0.00 0.01 0.11 0.11 0.06 78.0 
cyg -1.20 -1.20 0.13 0.13 0.08 75.0 
cya 0.70 0.71 0.15 0.15 0.07 62.0 
Cys -0.80  -0.79 0.14 0.18 0.08 59.0 
Computational Efficiency 
35~40min 90~120min 


Note. 6 = Generating values; E(#) = mean of point estimates; E{se1(6)} = mean 
of recursively approximated standard error estimates; E{se2(6)} = mean of post- 
convergence approximated standard errors; SD(6) = Monte Carlo standard de- 
viation of point estimates; 95% confidence interval coverage rate using post- 
convergence approximated standard errors; a = item slope parameters; c = item 
threshold parameters. 
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Table 3: Conditions of measurement models and generating values for item parameters 
Measurement Model 1 (MM 1) 


Condition Ci; indicators 71;; indicators 
X1~X5 (K; = 2) Y1IveY5 (Ky = 2) 

slope (a7) intercept (c;) 

pale all 0.8 -1.0 
X2, Y2 1.0 0.0 
X3, Y3 1.2 1.0 
x4, Y4 1.4 -0.5 
X5, Y5 1.6 0.5 

Measurement Model 2 (MM 2) 

Condition Ci; indicators 71;; indicators 
X1~X5 (K; = 5) Y1~Y5 (K; = 5) 

slope (a) intercepts (cy, C21, C31, C41) 

XLVI 0.8 -1.0, 0.0, 1.0, 2.0 
X2, Y2 1.0 -1.0, 0.0, 1.0, 2.0 
X3, Y3 1.2 -1.0, 0.0, 1.0, 2.0 
X4, Y4 1.4 -1.0, 0.0, 1.0, 2.0 
X5, Y5 1.6 -1.0, 0.0, 1.0, 2.0 


Table 4: Percentage of converged solution and average time per replication (in seconds) 


Large Compositional Effect = 0.5 
np=20 np=5 
ng=100 MM1 MM2 MM1 MM2 
ICC=0.1  100(2781) 89(4911) = 97(972) 81(1593) 
ICC=0.3 100(2657) 95(5301) 1001955) 95(1613) 
ng=25 MM1 MM2 MM1 MM2 
ICC=0.1 98(1046) 92(1522) N/A 
ICC=0.3. —-99(865) 93(1524) 
Small Compositional Effect = 0.2 
np=20 np=5 
ng=100 MM1 MM2 MM1 MM2 
ICC=0.1 97(2937) 91(5165)  95(1021) 92(1588) 
ICC=0.3 98(1785) 92(4910) 100(1046) 91(1593) 
ng=25 MM1 MM2 MM1 MM2 
ICC=0.1 = 95(919)  78(1521) N/A 
ICC=0.3 = 93(915) = 95(1519) 


Note. MM1 = Measurement model 1; MM2 = Measure- 
ment model 2; ng = number of groups; np = number 
of individuals per group. 
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Table 5: Structural parameter estimates from PISA 2000 USA data analysis 


Multilevel latent variable model Manifest variable HLM 
MH-RM EM EM 
Parameter 0 6 se(6) t-value 6 se(6) t-value 6 se(6) t-value 
10 0.42 0.06 7.17 0.42 0.05 7.92 0.11 0.01 7.75 
Yo1 0.16 0.11 1.43 0.18 0.11 1.68 0.07 0.02 3.60 
T00 0.47 0.11 0.39 0.47 0.11 4.28 0.37 - 190.31 
Uy 0.12 0.07 2.30 0.11 0.06 1.86 N/A N/A N/A 
BFLPE -0.27 0.13 -2.12 -0.24 0.12 -1.98 -0.04 0.02 -1.76 


Note. Reported standard errors for MH-RM algorithm are from recursively approxi- 
mated observed data information matrix. « The HLM software program produces a 
x? test for the variance component Tp. 


